Proper Loss Functions for Nonlinear Hawkes Processes

Authors

  • Aditya Krishna Menon
  • Young Lee
Abstract

Temporal point processes are a statistical framework for modelling the times at which events of interest occur. The Hawkes process is a well-studied instance of this framework that captures self-exciting behaviour, wherein the occurrence of one event increases the likelihood of future events. Such processes have been successfully applied to model phenomena ranging from earthquakes to behaviour in a social network. We propose a framework to design new loss functions to train linear and nonlinear Hawkes processes. This captures standard maximum likelihood as a special case, but allows for other losses that guarantee convex objective functions (for certain types of kernel) and admit simpler optimisation. We illustrate these points with three concrete examples: for linear Hawkes processes, we provide a least-squares style loss potentially admitting closed-form optimisation; for exponential Hawkes processes, we reduce training to a weighted logistic regression; and for sigmoidal Hawkes processes, we propose an asymmetric form of logistic regression.

Introduction

Temporal point processes are a classical statistical framework for modelling the times at which certain events of interest occur, such as failure times of a hard drive or the impact times of an earthquake (Cox and Isham 1980; Daley and Vere-Jones 2003). The simplest incarnation of these models is the Poisson process, which assumes the times between successive events are independent, and the number of events occurring in a time window follows a suitable Poisson distribution (Kingman 1993). Such models are a core tool in queuing theory (Erlang 1909; Kendall 1953). Despite their versatility, Poisson processes have an important limitation: they are incapable of modelling self-excitation, wherein the occurrence of one event increases the likelihood of further events. This characteristic is present in many real-world phenomena, such as the occurrence of an earthquake triggering an aftershock. The Hawkes process (Hawkes 1971; Laub, Taimre, and Pollett 2015) is an important extension of the classical Poisson process that allows for such "burstiness". The model has been applied in fields ranging from seismology (Ogata 1988) and finance (Bowsher 2007; Hardiman, Bercot, and Bouchaud 2013) to social media (Crane and Sornette 2008; Zhou, Zha, and Song 2013).

Given historical data of event times, the standard way to fit a Hawkes process is to maximise its log-likelihood (Ozaki 1979). This approach is appealing owing to its conceptual simplicity; however, for a generic nonlinear Hawkes process (defined formally in the next section), the resulting objective may be non-convex. Further, even for linear Hawkes processes, optimisation of the likelihood requires an involved iterative procedure. This raises a natural question: how might we design other losses for training Hawkes processes that have favourable properties compared to the likelihood?

In this paper, we provide a framework to design loss functions for (non-)linear Hawkes processes. Specifically, given a particular choice of nonlinearity, we provide a loss function that is suitable for estimating the parameters of the corresponding nonlinear Hawkes process, and which is further convex given a particular structure on the kernel.
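Before the formal development, the following minimal sketch illustrates the object being modelled, assuming the standard nonlinear Hawkes intensity λ(t) = g(μ + Σ_{t_i < t} φ(t − t_i)), with base rate μ, triggering kernel φ, and link function g; the linear, exponential, and sigmoidal processes discussed below correspond to g(x) = x, g(x) = exp(x), and g(x) = 1/(1 + exp(−x)). The exponential kernel and all function names here are illustrative choices, not the paper's implementation.

import numpy as np

def excitation(t, history, alpha=0.8, beta=1.0):
    # Sum of triggering kernels phi(t - t_i) = alpha * exp(-beta * (t - t_i))
    # over past events t_i < t; the exponential kernel is an illustrative choice.
    past = history[history < t]
    return np.sum(alpha * np.exp(-beta * (t - past)))

def hawkes_intensity(t, history, mu=0.2, link="linear"):
    # Conditional intensity lambda(t) = g(mu + sum_i phi(t - t_i)) for a chosen link g.
    z = mu + excitation(t, history)
    if link == "linear":       # linear link g(x) = x (clipped at 0 for safety)
        return max(z, 0.0)
    if link == "exponential":  # exponential Hawkes: g(x) = exp(x)
        return np.exp(z)
    if link == "sigmoidal":    # sigmoidal Hawkes: g(x) = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-z))
    raise ValueError(link)

# Example: intensity shortly after a burst of three events.
events = np.array([1.0, 1.2, 1.3])
print(hawkes_intensity(1.4, events, link="exponential"))

Passing such an intensity through the log-likelihood is what can yield a non-convex training objective for general links g, as noted above.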
We study three concrete instantiations of this framework:

(a) for linear Hawkes processes, we propose a loss with a potential closed-form solution; to our knowledge, the only extant closed-form solution for Hawkes processes arises in EM training (Lewis and Mohler 2011);
(b) for exponential Hawkes processes[1], we establish the suitability of the logistic loss, which allows us to reduce training to a logistic regression problem;
(c) for sigmoidal Hawkes processes, we show the viability of a modified logistic regression objective, which provides a convex objective for training.

At a technical level, our proposal rests upon three simple observations: (1) the Hawkes likelihood can be interpreted as a binary classification objective; (2) the asymptotic optimiser of the likelihood is a scaled density estimate; and (3) the broader family of proper losses (Buja, Stuetzle, and Shen 2005) retains this optimal solution, and thus also the fundamental target of interest. While these observations are conceptually simple, the explication of their connections for fitting Hawkes processes is, to our knowledge, novel, and we believe their implications are of interest.

[1] The term "exponential Hawkes" is sometimes used to mean a linear Hawkes process with exponential kernel. We use the term to mean a Hawkes process with exponential link, but arbitrary kernel.

Background

Our framework requires some background on temporal point processes, as well as loss functions for binary classification. A glossary of important symbols is provided in Table 1.

Temporal point processes

Temporal point processes model the times at which events of interest occur via a stochastic process (N_t)_{t ≥ 0}, where N_t − N_s measures the number of events that occur in the time interval (s, t]. We focus on two such processes.

Inhomogeneous Poisson process. Fix some locally integrable λ : R+ → R+, and for any 0 ≤ s < t, let Λ(s, t) := ∫_s^t λ(x) dx. An inhomogeneous Poisson process (IPP) with intensity λ(·) satisfies (Daley and Vere-Jones 2003):

(a) N_0 = 0 almost surely;
(b) for any s < t, N_t − N_s ∼ Poisson(Λ(s, t));
(c) for any s < t ≤ s′ < t′, N_t − N_s ⊥ N_{t′} − N_{s′}.

Condition (a) posits that events occur strictly after time 0. Condition (b) posits that the number of events in any interval has a Poisson distribution, with mean given by the integrated intensity in that interval. Condition (c) posits that the number of events in two disjoint intervals is independent.

IPPs may also be understood as the following generative model for event times: given some end time T, the number of events N is drawn from a Poisson with mean Λ(0, T), and the N event times are then drawn i.i.d. from a distribution P with density (Cox and Isham 1980, pg. 46)

    p(t) = λ(t) / Λ(0, T).   (1)

Given a history 𝒯 := {t_n}_{n=1}^N of event times, suppose we seek an intensity from a family {λ(·; θ) | θ ∈ Θ} for a suitable parameter space Θ. We may minimise the negative log-likelihood of θ, which for T := max_n t_n is, up to constants (Daley and Vere-Jones 2003, Equation 2.1.9),

    −∑_{n=1}^{N} log λ(t_n; θ) + Λ(0, T; θ).
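To make the generative view and the objective above concrete, here is a minimal sketch, assuming a simple hypothetical intensity family λ(t; θ) = θ₁ + θ₂ t (so Λ(0, T) has a closed form); the function names and the rejection-sampling step used to draw from p(t) = λ(t)/Λ(0, T) are illustrative choices rather than anything prescribed by the paper.

import numpy as np

rng = np.random.default_rng(0)

def lam(t, theta):
    # Hypothetical parametric intensity lambda(t; theta) = theta[0] + theta[1] * t.
    return theta[0] + theta[1] * t

def big_lambda(s, t, theta):
    # Integrated intensity Lambda(s, t) for the linear-in-t intensity above (closed form).
    return theta[0] * (t - s) + 0.5 * theta[1] * (t**2 - s**2)

def simulate_ipp(theta, T):
    # Generative view from the text: N ~ Poisson(Lambda(0, T)), then N event times
    # drawn i.i.d. from the density p(t) = lambda(t) / Lambda(0, T) via rejection sampling.
    N = rng.poisson(big_lambda(0.0, T, theta))
    lam_max = max(lam(0.0, theta), lam(T, theta))  # envelope for rejection sampling
    times = []
    while len(times) < N:
        t = rng.uniform(0.0, T)
        if rng.uniform(0.0, lam_max) < lam(t, theta):
            times.append(t)
    return np.sort(np.array(times))

def ipp_nll(theta, times, T):
    # Negative log-likelihood, up to constants: -sum_n log lambda(t_n; theta) + Lambda(0, T; theta).
    return -np.sum(np.log(lam(times, theta))) + big_lambda(0.0, T, theta)

events = simulate_ipp(theta=(0.5, 0.1), T=50.0)
print(len(events), ipp_nll((0.5, 0.1), events, T=50.0))

Minimising ipp_nll over θ with any off-the-shelf optimiser then recovers maximum-likelihood estimation for the IPP.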

Similar articles

Large Deviations for Markovian Nonlinear Hawkes Processes

In a 2007 paper, Bordenave and Torrisi [1] prove large deviation principles for Poisson cluster processes and, in particular, for linear Hawkes processes. In this paper, we first prove a large deviation principle for a special class of nonlinear Hawkes processes, i.e. a Markovian Hawkes process with a nonlinear rate and exponential exciting function, and then generalize it to get the result f...


Isotonic Hawkes Processes

Hawkes processes are powerful tools for modeling the mutual-excitation phenomena commonly observed in event data from a variety of domains, such as social networks, quantitative finance and healthcare records. The intensity function of a Hawkes process is typically assumed to be linear in the sum of triggering kernels, rendering it inadequate to capture nonlinear effects present in real-world d...


COVARIANCE MATRIX OF MULTIVARIATE REWARD PROCESSES WITH NONLINEAR REWARD FUNCTIONS

Multivariate reward processes with reward functions of constant rates, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced by Soltani (1996). In this work we study a multivariate process whose components are reward processes with nonlinear reward functions. The Laplace transform of the covar...


Process-level Large Deviations for Nonlinear Hawkes Point Processes

In this paper, we prove a process-level (also known as level-3) large deviation principle for a very general class of simple point processes, namely nonlinear Hawkes processes, with a rate function given by the process-level entropy, which has an explicit formula.




Publication date: 2017